Proteins: Structure, Function, and Bioinformatics — Latest Matching Preprints

1

Accurate Stoichiometry Prediction of Protein Complexes by Integrating AlphaFold3 and Template Information

Liu, J.; Neupane, P.; Cheng, J.

2025-01-15 bioinformatics 10.1101/2025.01.12.632663 medRxiv

Top 0.1%

38.4%

Protein structure prediction methods require stoichiometry information (i.e., subunit counts) to predict the quaternary structure of protein complexes. However, this information is often unavailable, making stoichiometry prediction crucial for complexes with unknown stoichiometry. Despite its importance, few computational methods address this challenge. In this study, we present an approach that integrates AlphaFold3 structure predictions with homologous template data to predict stoichiometry. The method generates candidate stoichiometries, builds structural models for them using AlphaFold3, ranks them based on AlphaFold3 scores, and further refines predictions with template-based information when available. In the 16th community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP16), our method achieved 71.4% top-1 accuracy and 92.9% top-3 accuracy, outperforming other predictors in terms of overall performance. This demonstrates the complementary strengths of AlphaFold3- and template-based predictions and highlights the method's applicability to uncharacterized protein complexes lacking stoichiometry data.
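
The ranking step and the top-k metrics mentioned in this abstract can be illustrated with a minimal sketch. The candidate stoichiometries, scores, and function names below are hypothetical placeholders, not the authors' pipeline and not real AlphaFold3 output; the sketch only shows how candidates might be ordered by a score and how top-1 and top-3 accuracy would then be computed.

```python
def rank_candidates(scores):
    """Return candidate stoichiometries sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def top_k_accuracy(ranked_lists, truths, k):
    """Fraction of targets whose true stoichiometry is among the first k candidates."""
    hits = sum(1 for ranked, truth in zip(ranked_lists, truths) if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical targets: candidate stoichiometry -> made-up confidence score.
targets = [
    {"A1": 0.40, "A2": 0.85, "A3": 0.60},
    {"A2B2": 0.70, "A1B1": 0.75, "A2B1": 0.50},
    {"A4": 0.30, "A2": 0.55, "A6": 0.65},
]
truths = ["A2", "A2B2", "A4"]  # made-up ground truth for the three targets

ranked = [rank_candidates(t) for t in targets]
top1 = top_k_accuracy(ranked, truths, 1)
top3 = top_k_accuracy(ranked, truths, 3)
print(f"top-1 = {top1:.3f}, top-3 = {top3:.3f}")
```

In this toy data only the first target is ranked correctly at position one, while all three true stoichiometries appear within the first three candidates.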

2

Simplified geometric representations of protein structures identify complementary interaction interfaces

McCafferty, C. L.; Marcotte, E. M.; Taylor, D. W.

2019-12-23 bioinformatics 10.1101/2019.12.18.880575 medRxiv

Top 0.1%

37.6%

Protein-protein interactions are critical to protein function, but three-dimensional (3D) arrangements of interacting proteins have proven hard to predict, even given the identities and 3D structures of the interacting partners. Specifically, identifying the relevant pairwise interaction surfaces remains difficult, often relying on shape complementarity with molecular docking while accounting for molecular motions to optimize rigid 3D translations and rotations. However, such approaches can be computationally expensive, and faster, less accurate approximations may prove useful for large-scale prediction and assembly of 3D structures of multi-protein complexes. We asked if a reduced representation of protein geometry retains enough information about molecular properties to predict pairwise protein interaction interfaces that are tolerant of limited structural rearrangements. Here, we describe a cuboid transformation of 3D protein accessible surfaces on which molecular properties such as charge, hydrophobicity, and mutation rate can be easily mapped, implemented in the MorphProt package. Pairs of surfaces are compared to rapidly assess partner-specific potential surface complementarity. On two available benchmarks of 85 overall known protein complexes, we observed F1 scores (a weighted combination of precision and recall) of 19-34% at correctly identifying protein interaction surfaces, comparable to more computationally intensive 3D docking methods in the annual Critical Assessment of PRedicted Interactions. Furthermore, we examined the effect of molecular motion through normal mode simulation on a benchmark receptor-ligand pair and observed no marked loss of predictive accuracy for distortions of up to 6 Å RMSD. Thus, a cuboid transformation of protein surfaces retains considerable information about surface complementarity, offers enhanced speed of comparison relative to more complex geometric representations, and exhibits tolerance to conformational changes.
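
For readers unfamiliar with the F1 metric cited in this abstract, the following is a minimal sketch of how it is conventionally computed, as the harmonic mean of precision and recall. The residue sets and the function name are hypothetical; this is not the MorphProt evaluation code, and the abstract does not state exactly how interface residues were scored.

```python
def interface_f1(predicted, actual):
    """F1 score (harmonic mean of precision and recall) for predicted interface residues."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0  # no overlap: precision and recall are both zero
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical residue numbers, for illustration only.
predicted_residues = {10, 11, 12, 40, 41}
actual_residues = {11, 12, 13, 14}

score = interface_f1(predicted_residues, actual_residues)
print(f"F1 = {score:.3f}")
```

Here two of five predicted residues are correct (precision 0.4) and two of four true residues are recovered (recall 0.5).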

3

Alternating handedness motifs in proteins classify structure and cofactor binding

Rizwan, S.; Pike, D.; Poudel, S.; Nanda, V.

2020-11-20 bioinformatics 10.1101/2020.11.17.367490 medRxiv

Top 0.1%

35.4%

Cofactor binding sites in proteins often are composed of favorable interactions of specific cofactors with the sidechains and/or backbone protein fold motifs. In many cases these motifs contain left-handed conformations which enable tight turns of the backbone that present backbone amide protons in direct interactions with cofactors termed cationic nests. Here, we defined alternating handedness of secondary structure as a search constraint within the PDB to systematically identify these cofactor binding nests. We identify unique alternating handedness structural motifs which are specific to the cofactors they bind. These motifs can guide the design of engineered folds that utilize specific cofactors and also enable us to gain a deeper insight into the evolution of the structure of cofactor binding sites.

4

AFM-RL: Large Protein Complex Docking Using AlphaFold-Multimer and Reinforcement Learning

Aderinwale, T.; Jahandideh, R.; Zhang, Z.; Zhao, B.; Xiong, Y.; Kihara, D.

2024-01-23 bioinformatics 10.1101/2024.01.20.576386 medRxiv

Top 0.1%

34.4%

Various biological processes in living cells are carried out by protein complexes, whose interactions can span across multiple protein structures. To understand the molecular mechanisms of such processes, it is crucial to know the quaternary structures of these complexes. Although the structures of many protein complexes have been determined through biophysical experiments, there are still many important complex structures that are yet to be determined, particularly for large complexes with multiple chains. To supplement experimental structure determination, many computational protein docking methods have been developed, but most are limited to two chains, and few are designed for three chains or more. We have previously developed a method, RL-MLZerD, for multiple protein docking, which was applied to complexes with three to five chains. Here, we expand the ability of this method to predict the structures of large protein complexes with six to twenty chains. We use AlphaFold-Multimer (AFM) to predict pairwise models and then assemble them using our reinforcement learning framework. Our new method, AFM-RL, can predict a diverse set of pairwise models, which aids the RL assembly steps for large protein complexes. Additionally, AFM-RL demonstrates improved modeling performance when compared to existing methods for large protein complex docking.

5

Deep learning enables the design of functional de novo antimicrobial proteins

Caceres-Delpiano, J.; Ibanez, R.; Alegre, P.; Sanhueza, C.; Paz, R.; Correa, S.; Retamal, P.; Jimenez, J. C.; Alvarez, L.

2020-08-26 bioengineering 10.1101/2020.08.26.266940 medRxiv

Top 0.1%

34.2%

Protein sequences are high-dimensional, which presents one of the main problems for the optimization and study of sequence-structure relations. The intrinsic degeneracy of protein sequences is hard to follow, but the continued discovery of new protein structures has shown that there is convergence in terms of the possible folds that proteins can adopt, such that proteins with sequence identities lower than 30% may still fold into similar structures. Given that proteins share a set of conserved structural motifs, machine-learning algorithms can play an essential role in the study of sequence-structure relations. Deep-learning neural networks are becoming an important tool in the development of new techniques, such as protein modeling and design, and they continue to gain power as new algorithms are developed and as increasing amounts of data are released every day. Here, we trained a deep-learning model based on previous recurrent neural networks to design analog protein structures using representation learning based on the evolutionary and structural information of proteins. We test the capabilities of this model by creating de novo variants of an antifungal peptide, with sequence identities of 50% or lower relative to the wild-type (WT) peptide. We show by in silico approximations, such as molecular dynamics, that the new variants and the WT peptide can successfully bind to a chitin surface with comparable relative binding energies. These results are supported by in vitro assays, where the de novo designed peptides showed antifungal activity that equaled or exceeded that of the WT peptide.

6

Benchmarking Peptide Structure Prediction with AlphaFold2

Gulsevin, A.; Meiler, J.

2022-02-17 biophysics 10.1101/2022.02.17.480937 medRxiv

Top 0.1%

30.8%

AlphaFold2 (AF2) is a computational tool developed for the determination of protein structures with high accuracy. AF2 has been used for the modeling of many soluble and membrane proteins, but its performance in modeling peptide structures has not been systematically investigated so far. We benchmarked the accuracy of AF2 in predicting the structures of peptides of 16 to 60 amino acids, using experimentally determined peptide structures as reference. Our results show that while AF2 can predict the structures of certain peptide scaffolds with RMSD values below 3 Å, it is less successful in predicting the structures of peptides that have kinks, turns, or extended flexible regions. Further, AF2 had several shortcomings in predicting rotamer recoveries, disulfide bonds, and the lowest RMSD structures based on pLDDT values. In summary, AF2 can be a powerful tool to determine peptide structures, but additional steps may be necessary to analyze and validate the results.
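
The RMSD values quoted in this abstract measure the deviation between matched atoms of a model and a reference structure. A minimal sketch of the calculation follows; it assumes the two coordinate sets are already superimposed and listed in the same atom order, and it performs no optimal alignment. The coordinates and the function name are illustrative assumptions, not taken from the benchmark.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in angstroms: the model is shifted 1 Å along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

value = rmsd(model, reference)
print(f"RMSD = {value:.2f} Å")
```

In practice the structures would first be superimposed (for example with a least-squares fit) before this deviation is reported.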

7

The Influence of Ligands on AlphaFold3 Prediction of Cryptic Pockets

Lazou, M.; Tuchscherer, F.; Vajda, S.; Joseph-McCarthy, D.

2026-01-04 bioinformatics 10.64898/2026.01.04.697564 medRxiv

Top 0.1%

30.5%

Cryptic pockets are binding sites that are formed or exposed upon a conformational change. They represent an important class of potentially druggable binding sites. Reliably predicting cryptic pockets capable of binding ligands, however, remains a challenge. Herein we examine the use of AlphaFold 3 (AF3) for generating realistic conformational ensembles that include known cryptic pockets. We find that AF3 is generally able to reproduce the scale of conformational change required for cryptic site formation. When given a cryptic-site ligand for the protein, AF3 predominantly predicts conformations competent to bind the ligand in the cryptic site; without the ligand, conformations lacking the cryptic pocket generally dominate. While the results may reflect a bias toward memorized structural priors, the level of detrimental memorization appears to be limited. We also show that the choice of the ligand can significantly impact the predictions, and that AF3 is able to produce models with the ligand correctly positioned. Variability in ligand position, however, suggests that generating ensembles of co-folded predictions is critical to enhancing the likelihood of obtaining a correct binding mode. Overall, AF3-generated protein-ligand structural ensembles have potential utility in cryptic-site drug discovery, and they can reveal ligands likely to bind to those sites.

8

Flanking Domains Modulate α-Synuclein Monomer Structure: A Molecular Dynamics Domain Deletion Study

Onishi, N.; Mazzaferro, N.; Kunstelj, S.; Alvarado, D.; Muller, A.; Vazquez, F. X.

2024-03-27 biophysics 10.1101/2024.03.23.586267 medRxiv

Top 0.1%

30.4%

Aggregates of misfolded α-synuclein proteins (asyn) are key markers of Parkinson's disease. Asyn proteins have three domains: an N-terminal domain, a hydrophobic NAC core implicated in aggregation, and a proline-rich C-terminal domain. Proteins with truncated C-terminal domains are known to be prone to aggregation, which suggests that understanding domain-domain interactions in asyn monomers could help elucidate the role of the flanking domains in modulating protein structure. To this end, we used Gaussian accelerated molecular dynamics (GAMD) to simulate wild-type (WT), N-terminal truncated (ΔN), C-terminal truncated (ΔC), and isolated NAC domain asyn protein variants (isoNAC). Using clustering and contact analysis, we found that removal of the N-terminal domain led to increased contacts between NAC and C-terminal domains and the formation of interdomain β-sheets. Removal of either flanking domain also resulted in increased compactness of every domain. We also found that the contacts between flanking domains in the WT protein result in an electrostatic potential (ESP) that may lead to favorable interactions with anionic lipid membranes. Removal of the C-terminal domain disrupts the ESP in a way that could result in over-stabilized protein-membrane interactions. These results suggest that cooperation between the flanking domains may modulate the protein's structure in a way that helps maintain elongation and creates an ESP that may aid favorable interactions with the membrane.

9

Modeling of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Proteins by Machine Learning and Physics-Based Refinement

Heo, L.; Feig, M.

2020-03-28 biophysics 10.1101/2020.03.25.008904 medRxiv

Top 0.1%

26.5%

Protein structures are crucial for understanding their biological activities. Since the outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), there has been an urgent need to understand the biological behavior of the virus and provide a basis for developing effective therapies. Since the proteome of the virus was determined, some of the protein structures could be determined experimentally, and others were predicted via template-based modeling approaches. However, tertiary structures for several proteins are still not available from experiment, nor could they be accurately predicted by template-based modeling because of the lack of close homolog structures. Previous efforts to predict structures for these proteins include efforts by DeepMind and the Zhang group via machine learning-based structure prediction methods, i.e. AlphaFold and C-I-TASSER. However, the predicted models vary greatly and have not yet been subjected to refinement. Here, we are reporting new predictions from our in-house structure prediction pipeline. The pipeline takes advantage of inter-residue contact predictions from trRosetta, a machine learning-based method. The predicted models were further improved by applying molecular dynamics simulation-based refinement. We also took the AlphaFold models and refined them by applying the same refinement method. Models based on our structure prediction pipeline and the refined AlphaFold models were analyzed and compared with the C-I-TASSER models. All of our models are available at https://github.com/feiglab/sars-cov-2-proteins.

10

Hotspot coevolution at protein-protein interfaces is a key identifier of native protein complexes

Mishra, S.; Cooper, S. J.; Parks, J. M.; Mitchell, J. C.

2019-07-10 bioinformatics 10.1101/698233 medRxiv

Top 0.1%

26.4%

Protein-protein interactions play a key role in mediating numerous biological functions, with more than half the proteins in living organisms existing as either homo- or hetero-oligomeric assemblies. Protein subunits that form oligomers minimize the free energy of the complex, but exhaustive computational search-based docking methods have not comprehensively addressed the protein docking challenge of distinguishing a natively bound complex from non-native forms. In this study, we propose a scoring function, KFC-E, that accounts for both conservation and coevolution of putative binding hotspot residues at protein-protein interfaces. For a benchmark set of 53 bound complexes, KFC-E identifies a near-native binding mode as the top-scoring pose in 38% and in the top 5 in 55% of the complexes. For a set of 17 unbound complexes, KFC-E identifies a near-native pose in the top 10 ranked poses in more than 50% of the cases. By contrast, a scoring function that incorporates information on coevolution at predicted non-hotspots performs poorly. Our study highlights the importance of coevolution at hotspot residues in forming natively bound complexes and suggests a novel approach for coevolutionary scoring in protein docking.

Author Summary: A fundamental problem in biology is to distinguish between the native and non-native bound forms of protein-protein complexes. Experimental methods are often used to detect the native bound forms of proteins but are demanding in terms of time and resources. Computational approaches have proven to be a useful alternative; they sample the different binding configurations for a pair of interacting proteins and then use a heuristic or physical model to score them. In this study we propose a new scoring approach, KFC-E, which focuses on the evolutionary contributions from a subset of key interface residues (hotspots) to identify native bound complexes. KFC-E capitalizes on the wealth of information in protein sequence databases by incorporating residue-level conservation and coevolution of putative binding hotspots. As hotspot residues mediate the binding energetics of protein-protein interactions, we hypothesize that the knowledge of putative hotspots coupled with their evolutionary information should be helpful in the identification of native bound protein-protein complexes.

11

Understanding structural and functional diversity of ATP-PPases using protein domains and functional families in CATH database

Waman, V.; Yin, J.; Sen, N.; Firdaus-Raih, M.; Lam, S. D.; Orengo, C.

2023-10-16 bioinformatics 10.1101/2023.10.12.562014 medRxiv

Top 0.1%

22.9%

ATP-Pyrophosphatases (ATP-PPases) are the most primordial lineage of the large and diverse HUP (HIGH-motif proteins, Universal Stress Proteins, ATP-Pyrophosphatase) superfamily. There are four different ATP-PPase substrate-specificity groups, and members of each group show considerable sequence variation across the domains of life despite sharing the same catalytic function. Over the past decade, there has been a >20-fold expansion in the number of ATP-PPase domain structures, most recently from advances in protein structure prediction (e.g. AlphaFold2). Using the enriched structural information, we have characterised the two most populated ATP-PPase substrate-specificity groups, the NAD synthases (NADS) and GMP synthases (GMPS). We performed local structural and sequence comparisons between the NADS and GMPS from different domains of life and identified taxonomic-group specific structural functional motifs. As GMPS and NADS are potential drug targets of pathogenic microorganisms including Mycobacterium tuberculosis, structural motifs specific to bacterial GMPS and NADS provide new insights that may aid antibacterial-drug design.

12

The Potential for SARS-CoV-2 to Evade Both Natural and Vaccine-induced Immunity

Shang, E.; Axelsen, P.

2020-12-13 immunology 10.1101/2020.12.13.422567 medRxiv

Top 0.1%

22.9%

SARS-CoV-2 attaches to the surface of susceptible cells through extensive interactions between the receptor binding domain (RBD) of its spike protein and angiotensin converting enzyme type 2 (ACE2) anchored in cell membranes. To investigate whether naturally occurring mutations in the spike protein are able to prevent antibody binding while maintaining ACE2 binding and viral infectivity, mutations in the spike protein identified in cases of human infection were mapped to the crystallographically determined interfaces between the spike protein and ACE2 (PDB entry 6M0J), antibody CC12.1 (PDB entry 6XC2), and antibody P2B-2F6 (PDB entry 7BWJ). Both antibody binding interfaces partially overlap with the ACE2 binding interface. Among 16 mutations that map to the RBD:CC12.1 interface, 11 are likely to disrupt CC12.1 binding but not ACE2 binding. Among 12 mutations that map to the RBD:P2B-2F6 interface, 8 are likely to disrupt P2B-2F6 binding but not ACE2 binding. As expected, none of the mutations observed to date appear likely to disrupt the RBD:ACE2 interface. We conclude that SARS-CoV-2 with mutated forms of the spike protein may retain the ability to bind ACE2 while evading recognition by antibodies that arise in response to the original wild-type form of the spike protein. It seems likely that immune evasion will be possible regardless of whether the spike protein was encountered in the form of infectious virus, or as the immunogen in a vaccine. Therefore, it also seems likely that reinfection with a variant strain of SARS-CoV-2 may occur among people who recover from Covid-19, and that vaccines with the ability to generate antibodies against multiple variant forms of the spike protein will be necessary to protect against variant forms of SARS-CoV-2 that are already circulating in the human population.

13

Bridging functional annotation gaps in non-model plant genes with AlphaFold, DeepFRI and small molecule docking

Stephan, G.; Dugdale, B.; Deo, P.; Harding, R.; Dale, J.; Visendi, P.

2021-12-23 bioinformatics 10.1101/2021.12.22.473925 medRxiv

Top 0.1%

22.8%

Background: Functional annotation assigns descriptive biological meaning to genetic sequences. Limited availability of manually curated or experimentally validated plant genes from a diverse range of taxa poses a significant challenge for functional annotation in non-model organisms. Accurate computational approaches are required. We argue that recent breakthroughs in deep learning have the potential to not only narrow the functional annotation gap between non-model and model plant organisms, but also annotate and reveal novel functions even for genes with no homologs in public databases. Results: Deep learning models were applied to functionally annotate a set of previously published differentially expressed genes. Predicted protein structures and functional annotations were generated using the AlphaFold protein structure and DeepFRI protein language inference models respectively. The resulting structures and functional annotations were validated using small molecule docking experiments. DeepFRI and AlphaFold models not only correctly annotated differentially expressed genes, but also revealed detailed mechanisms involving protein-protein interactions. Conclusions: Deep learning models are capable of inferring novel functions and achieving high accuracy in functional annotation. Their increased use in plant research will result in major improvements in annotations for non-model plants that are underrepresented in genome databases. We illustrate how integrating protein structure prediction, functional residue prediction, and small molecule docking can infer plausible protein-protein interactions and yield additional mechanistic insights. This approach will aid in the selection of candidate genes for further study from differential expression studies that generate large gene lists.

14

Amino acid characteristics in protein native state structures

Skrbic, T.; Giacometti, A.; Hoang, T. X.; Maritan, A.; Banavar, J. R.

2023-12-13 biophysics 10.1101/2023.12.12.571261 medRxiv

Top 0.1%

22.5%

We present a geometrical analysis of the protrusion statistics of side chains in more than 4,000 high-resolution protein structures. We employ a coarse-grained representation of the protein backbone viewed as a linear chain of Cα atoms and consider just the heavy atoms of the side chains. We study the large variety of behaviors of the amino acids based on both rudimentary structural chemistry as well as geometry. Our geometrical analysis uses a backbone Frenet coordinate system for the common study of all amino acids. Our analysis underscores the richness of the repertoire of amino acids that is available to nature to design protein sequences that fit within the putative native state folds.
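
A backbone Frenet coordinate system assigns a local tangent, normal, and binormal to each point along the chain. The sketch below uses one common discrete convention built from three consecutive backbone points; the abstract does not give the authors' exact definition, so the construction, the function names, and the coordinates here are assumptions for illustration only.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("zero-length vector; points may be collinear or repeated")
    return tuple(a / norm for a in v)


def frenet_frame(prev_ca, ca, next_ca):
    """Discrete Frenet frame at `ca` from its two neighbours along the chain."""
    tangent = unit(sub(next_ca, prev_ca))                       # along the chain
    binormal = unit(cross(sub(ca, prev_ca), sub(next_ca, ca)))  # normal to the local plane
    normal = cross(binormal, tangent)                           # points toward the bend
    return tangent, normal, binormal


# Made-up points forming a right-angle bend in the xy-plane.
t, n, b = frenet_frame((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
print("tangent:", t)
print("normal:", n)
print("binormal:", b)
```

Side-chain atom positions can then be expressed in this local frame by projecting their offsets from the central atom onto the three axes, which makes residues at different positions directly comparable.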

15

Common substructures and sequence characteristics of sandwich-like proteins from 42 different folds

Kister, A.

2020-05-27 bioinformatics 10.1101/2020.05.27.108969 medRxiv

Top 0.1%

22.5%

This study addresses the following fundamental question: Do sequences of protein domains with sandwich architecture have common sequence characteristics even though they belong to different superfamilies and folds? The analysis was carried out in two stages: determination of substructures in the domains that are common to all sandwich proteins; and detection of common sequence characteristics within the substructures. Analysis of supersecondary structures in domains of proteins revealed two types of four-strand substructures that are common to sandwich proteins. At least one of these common substructures was found in proteins of 42 sandwich-like folds (as per structural classification in the CATH database). Comparison of the sequence fragments corresponding to strands that make up the common substructures revealed specific rules of distribution of hydrophobic residues within these strands. These rules can be conceptualized as grammatical rules of beta protein linguistics. Understanding of the structural and sequence commonalities of sandwich proteins may also be useful for rational protein design.

16

AlphaFold2 predicts interactions amidst confounding structural compatibility

Martin, J.

2023-08-27 bioinformatics Community evaluation 10.1101/2023.08.25.554771 medRxiv

Top 0.1%

22.4%

Predicting physical interactions is one of the holy grails of computational biology, galvanized by rapid advancements in deep learning. AlphaFold2, although not developed with this goal, seems promising in this respect. Here, I test the prediction capability of AlphaFold2 on a very challenging data set, where proteins are structurally compatible, even when they do not interact. AlphaFold2 achieves high discrimination between interacting and non-interacting proteins, and the cases of misclassification can either be rescued by revisiting the input sequences or can suggest false positives and negatives in the data set. AlphaFold2 is thus not impaired by the compatibility between protein structures and has the potential to be applied at large scale.

17

A common network of residue-residue contacts underlies peptides' interactions with MHC class II complex

Kister, A. E.; Kister, I.

2025-03-25 immunology 10.1101/2025.03.22.644772 medRxiv

Top 0.1%

22.2%

The formation of a stable peptide-MHC class II complex is a critical step in the adaptive immune response. In this work, we investigate the residue-residue contacts that anchor the peptide between the alpha and beta chains of MHC II and examine whether such anchoring residue-residue contacts are shared among different peptide-MHC II complexes. We hypothesize that there is a similarity between the map of contacts of antigenic peptides with the alpha and beta chains of MHC II and the map of contacts of the "natural" complex of MHC II with the CLIP - the fragment of the gamma chain. Thus, the CLIP-MHC II complex - specifically, PDB structure 3PDO - was taken as the prototype for peptide-MHC II interaction. To compare the contact maps between the prototype structure and antigenic peptides/MHC II in 14 crystal structures, we developed a unified numbering system for residues in peptide-MHC II complexes. Using this unified residue numbering system, we show that approximately half of the CLIP-MHC II residue-residue contacts have analogs in structures that involve different antigenic peptides and different MHC II (HLA-DR, HLA-DQ, and mouse A/B) alpha and beta chains. We present here this common network of contacts that underlies peptide/MHC class II interactions, as well as the structural and physicochemical characteristics of these contacts. Based on these shared characteristics, we propose criteria for the specificity of antigenic peptide loading into MHC II, whereby one can predict whether a particular peptide fragment will bind to MHC II as well as the likely localization of the fragment within the peptide binding groove of MHC II.

18

Assessment of Protein Complex Predictions in CASP16: Are we making progress?

Zhang, J.; Yuan, R.; Kryshtafovych, A.; Kretsch, R. C.; Schaeffer, R. D.; Zhou, J.; Das, R.; Grishin, N. V.; Cong, Q.

2025-05-30 biophysics 10.1101/2025.05.29.656875 medRxiv

Top 0.1%

22.1%

The assessment of oligomer targets in the Critical Assessment of Structure Prediction Round 16 (CASP16) suggests that complex structure prediction remains an unsolved challenge. More than 30% of targets, particularly antibody-antigen targets, were highly challenging, with each group correctly predicting structures for only about a quarter of such targets. Most CASP16 groups relied on AlphaFold-Multimer (AFM) or AlphaFold3 (AF3) as their core modeling engines. By optimizing input MSAs, refining modeling constructs (using partial rather than full sequences), and employing massive model sampling and selection, top-performing groups were able to significantly outperform the default AFM/AF3 predictions. CASP16 also introduced two additional challenges: Phase 0, which required predictions without stoichiometry information, and Phase 2, which provided participants with thousands of models generated by MassiveFold (MF) to enable large-scale sampling for resource-limited groups. Across all phases, the MULTICOM series and Kiharalab emerged as top performers based on the quality of their best models per target. However, these groups did not have a strong advantage in model ranking, and thus their lead over other teams, such as Yang-Multimer and kozakovvajda, was less pronounced when evaluating only the first submitted models. Compared to CASP15, CASP16 showed moderate overall improvement, likely driven by the release of AF3 and the extensive model sampling employed by top groups. Several notable trends highlight key frontiers for future development. First, the kozakovvajda group significantly outperformed others on antibody-antigen targets, achieving over a 60% success rate without relying on AFM or AF3 as their primary modeling framework, suggesting that alternative approaches may offer promising solutions for these difficult targets. Second, model ranking and selection continue to be major bottlenecks. 
The PEZYFoldings group demonstrated a notable advantage in selecting their best models as first models, suggesting that their pipeline for model ranking may offer important insights for the field. Finally, the Phase 0 experiment indicated reasonable success in stoichiometry prediction; however, stoichiometry prediction remains challenging for high-order assemblies and targets that differ from available homologous templates. Overall, CASP16 demonstrated steady progress in multimer prediction while emphasizing the urgent need for more effective model ranking strategies, improved stoichiometry prediction, and the development of new modeling methods that extend beyond the current AF-based paradigm.

19

A simple method for computationally unstructuring proteins: some findings

Powell, A.

2026-03-03 biophysics 10.1101/2024.11.10.622713 medRxiv

Top 0.1%

22.0%

A methodology for computationally unstructuring proteins is described and the results of its application to a variety of proteins analyzed and discussed. Some proteins prove more susceptible than others, and fold topology plays a part in this. Alpha helical structure is found to be generally somewhat robust, and, perhaps unsurprisingly, unstructuring often begins at exposed chain termini. Phosphofructokinase-1 and phosphofructokinase-2, which have similar sizes but different fold topologies, are found to differ markedly in their unstructuring behaviour.

20

SPOT-1D-LM: Reaching Alignment-profile-based Accuracy in Predicting Protein Secondary and Tertiary Structural Properties without Alignment.

Singh, J.; Paliwal, K.; Singh, J.; Zhou, Y.

2021-10-16 bioinformatics 10.1101/2021.10.16.464622 medRxiv

Top 0.1%

21.9%

Protein language models have emerged as an alternative to multiple sequence alignment for enriching sequence information and improving downstream prediction tasks such as biophysical, structural, and functional properties. Here we show that a combination of traditional one-hot encoding with the embeddings from two different language models (ProtTrans and ESM-1b) allows a leap in accuracy over single-sequence based techniques in predicting protein 1D secondary and tertiary structural properties, including backbone torsion angles, solvent accessibility and contact numbers. This large improvement leads to an accuracy comparable to or better than the current state-of-the-art techniques for predicting these 1D structural properties based on sequence profiles generated from multiple sequence alignments. The high-accuracy prediction in both secondary and tertiary structural properties indicates that it is possible to make highly accurate prediction of protein structures without homologous sequences, the remaining obstacle in the post AlphaFold2 era.
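
The feature construction described in this abstract, combining one-hot encoding with embeddings from two language models, can be pictured as a per-residue concatenation. The sketch below uses tiny made-up vectors in place of real ProtTrans and ESM-1b embeddings and does not call either model; the function names and dimensions are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue):
    """20-dimensional one-hot vector for a standard amino acid."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def combine_features(sequence, *embeddings):
    """Concatenate, per residue, the one-hot vector with each embedding row."""
    for emb in embeddings:
        if len(emb) != len(sequence):
            raise ValueError("each embedding needs one row per residue")
    features = []
    for i, residue in enumerate(sequence):
        row = one_hot(residue)
        for emb in embeddings:
            row.extend(emb[i])
        features.append(row)
    return features


sequence = "MKV"
# Stand-in embeddings (real language-model embeddings have far more dimensions).
emb_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
emb_b = [[1.0], [2.0], [3.0]]

features = combine_features(sequence, emb_a, emb_b)
print(len(features), "residues x", len(features[0]), "features")
```

The resulting per-residue matrix would then be passed to a downstream predictor of secondary structure, torsion angles, or solvent accessibility.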